Combining Trigram and Winnow in Thai OCR Error Correction

Authors

  • Surapant Meknavin
  • Boonserm Kijsirikul
  • Ananlada Chotimongkol
  • Cholwich Nattee
Abstract

This paper presents a survey of computer and internet usage in both governmental and private universities. Questionnaires were sent to university administrators at the levels of department head, office head, associate dean, and dean at 24 governmental universities and 15 private universities; 46.7% of the questionnaires were returned. The results show that both governmental and private universities strongly support their personnel and students in using computers and the internet. There has been encouragement to increase the number of computers available both at the universities and at home. The three major applications of computers and the internet in universities are registration and grade verification, instruction, and library systems. For instructors, the important internet activities are email, reading news, and researching class materials. Problems cited in the use of the internet are network malfunctions, slow data transfer, and an inadequate number of available computers.

Similar resources

Automatic Sentence Break Disambiguation for Thai

Unlike English, the Thai language has no explicit sentence marker. Conventionally, a space is placed at the end of a sentence in written Thai, but a space does not always indicate a sentence boundary. In this paper, we propose a feature-based algorithm that extracts sentences from a paragraph by detecting the appropriate sentence-breaking spaces. ...

Statistics and Phonotactical Rules in Finding OCR Errors

This report describes two experiments in finding errors in optically scanned Swedish text without a lexicon. First, statistics were used to find unexpectedly frequent trigrams, and correction rules were created. Second, Bengt Sigurd's model of Swedish phonotax was used to detect words with a phonotactically illegal beginning or end. The phonotax did not perform as well as the statistical rules did on their ...

Thai spelling analysis for automatic spelling speech recognition

Spelling speech recognition can be applied for several purposes including enhancement of speech recognition systems and implementation of name retrieval systems. This paper presents a Thai spelling analysis to develop a Thai spelling speech recognizer. The Thai phonetic characteristics, alphabet system and spelling methods have been analyzed. As a training resource, two alternative corpora, a s...

Comparing Winnow and RIPPER in Thai Named-Entity Identification

This paper presents an application of two machine learning algorithms, Winnow and RIPPER, and compares them on the task of Thai named-entity identification. While most previous work on this task is based on hand-coded rules, we use learning algorithms to help automate the development of a named-entity system. Since the Thai language has no explicit word boundaries, Thai name is much more d...

Automated Error Detection in Digitized Cultural Heritage Documents

The work reported in this paper aims at performance optimization in the digitization of documents pertaining to the cultural heritage domain. A hybrid method is proposed, combining statistical classification algorithms and linguistic knowledge to automate post-OCR error detection and correction. The current paper deals with the integration of linguistic modules and their impact on error

Publication date: 1998